You Probably Know


• Many use cases:
+ High performance, low energy consumption
NAND Flash Memory Challenges


— Requires erase before program (write) 
— High raw bit error rate 


Raw Flash 
ECC Controller
Limited Flash Memory Lifetime


Goal: Extend flash memory lifetime 


at low cost 


[Figure: raw bit error rate (RBER) versus program/erase (P/E) cycles, with the P/E cycle lifetime marked]
Retention Loss


Charge leakage over time


[Figure: charge leaking from a flash cell, causing a retention error]


One dominant source of flash
memory errors [DATE '12, ICCD '12]


Before I show you
how we extend flash lifetime ...


NAND Flash 101 
Threshold Voltage (Vth)
[Figure: two flash cells]
Threshold Voltage (Vth) Distribution
[Figure axis: probability density function (PDF)]
Read Reference Voltage (Vref)
[Figure axis: normalized Vth]
Multi-Level Cell (MLC)
Threshold Voltage Reduces Over Time


After some retention loss: 
Fixed Read Reference Voltage Becomes Suboptimal


After some retention loss: 
Optimal Read Reference Voltage (OPT)


After some retention loss: 
Minimal raw bit errors
Goal 1: Design a low-cost mechanism that dynamically finds the optimal read reference voltage


Retention Failure


After significant retention loss:
Uncorrectable errors
Goal 2: Design an offline mechanism to recover data after detecting uncorrectable errors
To understand the effects of retention loss:


- Characterize retention loss using real chips


Characterization Methodology


[Figure: photo of the FPGA-based test board with NAND controllers]
FPGA-based flash memory testing platform [Cai+, FCCM '11]


• FPGA-based flash memory testing platform
• Real 20- to 24-nm MLC NAND flash chips
• 0 to 40 days' worth of retention loss
• Room temperature (20°C)
• 0 to 50k P/E cycles
Characterize the effects of retention loss


1. Threshold Voltage Distribution 


2. Optimal Read Reference Voltage 


3. RBER and P/E Cycle Lifetime 


1. Threshold Voltage (Vth) Distribution
[Figure: probability density function versus normalized threshold voltage]
2. Optimal Read Reference Voltage (OPT)
Finding: OPT decreases over time
3. RBER and P/E Cycle Lifetime


[Figure: raw bit error rate (RBER) versus P/E cycles, reading data with 7 days' worth of retention loss; annotation: closer to actual OPT]
Characterization Summary


Due to retention loss
- Cell's threshold voltage (Vth) decreases over time


- Optimal read reference voltage (OPT) decreases 
over time 


Using the actual OPT for reading 
- Achieves the longest lifetime 
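
The summary can be illustrated with a toy model. This is a minimal sketch, assuming two neighboring states whose threshold voltages are Gaussian with equal spread; the means, the spread, and the size of the downward shift are made-up numbers, not measurements from the characterized chips. It shows that when both states drift down, the error-minimizing read reference voltage drifts down too, and that the old reference then causes more errors.

```python
import math

def gaussian_cdf(x, mean, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))

def raw_error_fraction(vref, low_mean, high_mean, sigma):
    """Fraction of cells misread when vref separates two equally likely states."""
    low_read_as_high = 1.0 - gaussian_cdf(vref, low_mean, sigma)
    high_read_as_low = gaussian_cdf(vref, high_mean, sigma)
    return 0.5 * (low_read_as_high + high_read_as_low)

def find_opt(low_mean, high_mean, sigma, candidates):
    """Return the candidate read reference voltage with the fewest errors."""
    return min(candidates,
               key=lambda v: raw_error_fraction(v, low_mean, high_mean, sigma))

# Hypothetical normalized Vth values (assumptions, not chip data).
candidates = [i / 100 for i in range(100, 301)]       # 1.00 .. 3.00
opt_fresh = find_opt(1.50, 2.50, 0.2, candidates)     # right after programming
# Assumed retention loss: both states drop, the higher state by more.
opt_aged = find_opt(1.45, 2.25, 0.2, candidates)

# Errors on the aged data when reading with the old versus the new reference.
err_fixed = raw_error_fraction(opt_fresh, 1.45, 2.25, 0.2)
err_opt = raw_error_fraction(opt_aged, 1.45, 2.25, 0.2)
print(opt_fresh, opt_aged, err_fixed > err_opt)
```

With equal spreads, the optimum sits midway between the two state means, which is why it moves down as the means move down.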
Goal 1: Design a low-cost mechanism that


dynamically finds the optimal read reference 
voltage 


Naive Solution: Sweeping Vref


Key idea: Read the data multiple times with
different read reference voltages until the raw
bit errors are correctable by ECC


✓ Finds the optimal read reference voltage


✗ Requires many read-retries → higher read
latency
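
The sweeping idea can be sketched in a few lines. This is a simplified simulation under stated assumptions: only two states are modeled (a real MLC cell has four), `ECC_LIMIT`, the sweep range, and the voltage values are hypothetical, and correctability is decided by comparing against the known stored bits, whereas a real controller would learn it from the ECC decoder.

```python
import random

ECC_LIMIT = 40  # hypothetical: bit errors per page that ECC can correct

def read_page(cell_vth, vref):
    """Sense each cell: 1 if its Vth is above vref, else 0."""
    return [1 if v > vref else 0 for v in cell_vth]

def count_errors(read_bits, stored_bits):
    """Number of bit positions where the read value differs from the stored value."""
    return sum(r != s for r, s in zip(read_bits, stored_bits))

def sweep_read(cell_vth, stored_bits, vref_candidates):
    """Try each candidate in order; stop once the errors fit within ECC_LIMIT.

    Returns (vref, number_of_reads), or (None, number_of_reads) on failure.
    """
    reads = 0
    for vref in vref_candidates:
        reads += 1
        if count_errors(read_page(cell_vth, vref), stored_bits) <= ECC_LIMIT:
            return vref, reads
    return None, reads

random.seed(0)
stored_bits = [random.randint(0, 1) for _ in range(8192)]
# Hypothetical aged page: state 0 centered at 1.3, state 1 at 2.1.
cell_vth = [random.gauss(2.1 if b else 1.3, 0.1) for b in stored_bits]
candidates = [3.0 - 0.05 * i for i in range(41)]   # sweep downward from 3.0 to 1.0
vref, reads = sweep_read(cell_vth, stored_bits, candidates)
print(vref, reads)
```

Every failed attempt in the loop is a full page read, which is where the extra read latency comes from.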
Comparison of Flash Read Techniques


Flash Read Techniques   Lifetime (P/E Cycle)   Performance (Read Latency)
Fixed Vref              ✗                      ✓
Sweeping Vref           ✓                      ✗
Our Goal                ✓                      ✓
Observations


1. The optimal read reference voltage gradually
decreases over time


Key idea: Record the old OPT as a prediction (Vpred) of
the actual OPT


Benefit: Close to actual OPT → Fewer read retries


2. The amount of retention loss is similar across pages
within a flash block


Key idea: Record only one Vpred for each block
Benefit: Small storage overhead (768KB out of 512GB)
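
To put the storage overhead in perspective, the ratio can be worked out directly. The only assumption in this sketch is that KB and GB are binary units (1024 bytes and 1024^3 bytes); with decimal units the ratio comes out at 1.5 parts per million instead.

```python
KIB = 1024
GIB = 1024 ** 3

overhead_bytes = 768 * KIB    # per-block prediction storage, from the slide
capacity_bytes = 512 * GIB    # flash capacity, from the slide

fraction = overhead_bytes / capacity_bytes
print(f"{fraction:.2e} of capacity ({fraction * 100:.6f}%)")
```

That is roughly 1.4 bytes of metadata per million bytes of flash.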
Retention Optimized Reading (ROR)


Components:
1. Online pre-optimization algorithm


- Periodically records a Vpred for each block


2. Improved read-retry technique


- Utilizes the recorded Vpred to minimize read-retry
count
1. Online Pre-Optimization Algorithm


• Triggered periodically (e.g., per day)


• Find and record an OPT as per-block Vpred


• Performed in background


• Small storage overhead
[Figure: PDF versus normalized Vth, showing the old Vpred and the new Vpred]
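
A minimal sketch of the pre-optimization step follows, assuming a two-state model and a hypothetical `VpredTable` class with one entry per block. Finding the optimum by comparing against known stored bits is a simulation shortcut, and the one-day voltage drop is exaggerated so the effect is visible; none of these names or numbers come from the original slides.

```python
import random

def count_errors(cell_vth, stored_bits, vref):
    """Number of cells misread when sensing at vref (reads 1 if Vth > vref)."""
    return sum((v > vref) != bool(b) for v, b in zip(cell_vth, stored_bits))

def find_opt(cell_vth, stored_bits, candidates):
    """Candidate read reference voltage with the fewest raw bit errors."""
    return min(candidates,
               key=lambda vref: count_errors(cell_vth, stored_bits, vref))

class VpredTable:
    """Hypothetical table holding one predicted read reference voltage per block."""

    def __init__(self, num_blocks, initial_vref):
        self.vpred = [initial_vref] * num_blocks

    def pre_optimize(self, block_id, sample_vth, sample_bits, candidates):
        """Background step: record the sampled page's current optimum for the block."""
        self.vpred[block_id] = find_opt(sample_vth, sample_bits, candidates)
        return self.vpred[block_id]

random.seed(1)
bits = [random.randint(0, 1) for _ in range(4000)]
candidates = [1.0 + 0.05 * i for i in range(41)]   # 1.0 .. 3.0

table = VpredTable(num_blocks=4, initial_vref=2.0)
# First run: freshly programmed page (made-up state means 1.5 and 2.5).
day0 = [random.gauss(2.5 if b else 1.5, 0.2) for b in bits]
old_vpred = table.pre_optimize(0, day0, bits, candidates)
# Later run: exaggerated retention loss (made-up state means 1.4 and 2.0).
day1 = [random.gauss(2.0 if b else 1.4, 0.2) for b in bits]
new_vpred = table.pre_optimize(0, day1, bits, candidates)
print(old_vpred, new_vpred)
```

Only one number per block is kept, so the table stays small regardless of how many pages each block holds.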
2. Improved Read-Retry Technique


• Performed as normal read
• Vpred already close to actual OPT
• Decrease Vref if Vpred fails, and retry
[Figure: Vpred very close to the actual OPT; x-axis: normalized Vth]
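
The retry loop can be sketched as follows. This is an illustration under stated assumptions: voltages are integers in hypothetical normalized units, only two states are modeled, the ECC limit of one error is chosen to keep the example tiny, and correctability is checked against known stored bits rather than by an ECC decoder.

```python
def read_with_retry(cell_vth, stored_bits, vpred, ecc_limit, step=5, max_retries=40):
    """Start reading at vpred; on failure, lower the reference by `step` and retry.

    Returns (vref_used, retries), or (None, max_retries) if every attempt fails.
    """
    vref = vpred
    for retries in range(max_retries + 1):
        errors = sum((v > vref) != bool(b) for v, b in zip(cell_vth, stored_bits))
        if errors <= ecc_limit:
            return vref, retries
        vref -= step
    return None, max_retries

# Hypothetical aged page: the four state-1 cells have leaked toward state 0.
stored_bits = [0, 0, 0, 0, 1, 1, 1, 1]
cell_vth = [130, 140, 145, 150, 186, 192, 205, 210]

# Starting from a recorded prediction that is already close to the optimum.
from_vpred = read_with_retry(cell_vth, stored_bits, vpred=200, ecc_limit=1)
# Starting from a far-away voltage, as a plain sweep would.
from_far = read_with_retry(cell_vth, stored_bits, vpred=300, ecc_limit=1)
print(from_vpred, from_far)   # (190, 2) (190, 22)
```

Both calls end at the same reference voltage, but the one seeded with the recorded prediction needs 2 retries instead of 22.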
Retention Optimized Reading: Summary


Flash Read Techniques   Lifetime (P/E Cycle)   Performance (Read Latency)
Fixed Vref              ✗                      ✓
Sweeping Vref           ✓ 64% ↑                ✗
ROR                     ✓ 64% ↑                ✓ Nom. Life: 2.4% ↓, Ext. Life: 70.4% ↓
Goal 2: Design an offline mechanism to recover


data after detecting uncorrectable errors 


Retention Failure 
After significant retention loss: 
Uncorrectable errors
Leakage Speed Variation
Initially, Right After Programming
After Some Retention Loss


[Figure axis: PDF]
Fast-leaking cells have lower Vth
Slow-leaking cells have higher Vth
Eventually: Retention Failure
Retention Failure Recovery (RFR)


Key idea: Guess original state of the cell from 
its leakage speed property 


Three steps 

1. Identify risky cells 

2. Identify fast-/slow-leaking cells 
3. Guess original states 
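
The three steps can be sketched as one function. This is a toy illustration, not the authors' implementation: the window around the read reference voltage, the leak threshold, and all cell values are hypothetical. The guessing rule is an assumption inferred from the leakage observation: a fast-leaking cell found near the boundary most likely started in the higher state, and a slow-leaking one in the lower state.

```python
def rfr_guess(vth_at_failure, vth_later, opt, window, drop_threshold):
    """Guess each cell's original state (0 = lower state, 1 = higher state).

    vth_at_failure: Vth read when the uncorrectable error is detected
    vth_later:      Vth read again after additional retention time
    """
    guesses = []
    for before, after in zip(vth_at_failure, vth_later):
        if abs(before - opt) > window:
            # Step 1: the cell is not risky, so keep the normal read result.
            guesses.append(1 if before > opt else 0)
            continue
        # Step 2: classify the risky cell by how much it leaked in the interval.
        fast_leaking = (before - after) >= drop_threshold
        # Step 3 (assumed rule): fast leaker -> higher state, slow -> lower state.
        guesses.append(1 if fast_leaking else 0)
    return guesses

# Hypothetical cells in integer normalized units; `original` is the ground truth.
original = [1, 0, 1, 0, 1]
before = [166, 172, 210, 130, 175]
after = [158, 171, 207, 129, 173]

normal_read = [1 if v > 170 else 0 for v in before]
recovered = rfr_guess(before, after, opt=170, window=10, drop_threshold=5)

errors_normal = sum(a != b for a, b in zip(normal_read, original))
errors_rfr = sum(a != b for a, b in zip(recovered, original))
print(recovered, errors_normal, errors_rfr)   # [1, 0, 1, 0, 0] 2 1
```

The last cell is a slow leaker that really was in the higher state, so the rule misguesses it. The guess is probabilistic, and ECC is still needed for whatever errors remain.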
1. Identify Risky Cells
2. Identifying Fast- vs. Slow-Leaking Cells
3. Guess Original States
RFR Evaluation


[Figure: timeline with three steps: program with random data; detect failure, back up data; recover data. Intervals marked: 28 days, 12 additional days]


• Expect to eliminate 50% of raw bit errors


• ECC can correct remaining errors
To understand the effects of retention loss:
- Characterize retention loss using real chips
Goal 1: Design a low-cost mechanism that
dynamically finds the optimal read reference voltage


Goal 2: Design an offline mechanism to recover
data after detecting uncorrectable errors
Conclusion


Problem: Retention loss reduces flash lifetime 
Overall Goal: Extend flash lifetime at low cost 


Flash Characterization: Developed an understanding 
of the effects of retention loss in real chips 


Retention Optimized Reading: A low-cost mechanism 
that dynamically finds the optimal read reference 
voltage 


- 64% lifetime ↑, 70.4% read latency ↓


Retention Failure Recovery: An offline mechanism
that recovers data after detecting uncorrectable
errors
- Raw bit error rate 50% ↓, reduces data loss
Backup Slides


RFR Motivation


Data loss can happen in many ways
1. High P/E cycle


2. High temperature → accelerates retention
loss


3. High retention age (lost power for a long 
time) 
What if there are other errors?


Key: RFR does not have to correct all errors


Example:
• ECC can correct 40 errors in a page


• Corrupted page has 20 retention errors, 25
other errors (45 total errors)


• After RFR: 10 retention errors, 30 other errors
(40 total errors → ECC correctable)